FLEX: A Slot Allocation Scheduling Optimizer for MapReduce Workloads
Abstract
Originally, MapReduce implementations such as Hadoop employed First In First Out (FIFO) scheduling, but such simple schemes cause job starvation. The Hadoop Fair Scheduler (HFS) is a slot-based MapReduce scheme designed to ensure a degree of fairness among the jobs, by guaranteeing each job at least some minimum number of allocated slots. Our prime contribution in this paper is a different, flexible scheduling allocation scheme, known as FLEX. Our goal is to optimize any of a variety of standard scheduling theory metrics (response time, stretch, makespan and Service Level Agreements (SLAs), among others) while ensuring the same minimum job slot guarantees as in HFS, and maximum job slot guarantees as well. The FLEX allocation scheduler can be regarded as an add-on module that works synergistically with HFS. We describe the mathematical basis for FLEX, and compare it with FIFO and HFS in a variety of experiments.
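To make the metrics named in the abstract concrete, the following minimal Python sketch computes response time, stretch, and makespan for a finished schedule. It uses the common textbook definitions (response time as completion minus arrival, stretch as response time divided by the job's minimum running time, makespan as the latest completion time); the paper's exact formulations may differ. The `Job` class, its field names, and the sample numbers are hypothetical and are not taken from FLEX or Hadoop.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical record of one finished job (not a Hadoop or FLEX type)."""
    arrival: float     # time the job was submitted
    min_runtime: float # assumed: running time if the job had the cluster to itself
    completion: float  # time the job finished under the schedule being evaluated


def response_time(job: Job) -> float:
    # Time the job spent in the system, from submission to completion.
    return job.completion - job.arrival


def stretch(job: Job) -> float:
    # Response time normalized by the job's minimum running time,
    # so short jobs are penalized more for the same absolute delay.
    return response_time(job) / job.min_runtime


def makespan(jobs: list[Job]) -> float:
    # Completion time of the last job to finish.
    return max(job.completion for job in jobs)


def average_response_time(jobs: list[Job]) -> float:
    return sum(response_time(job) for job in jobs) / len(jobs)


def max_stretch(jobs: list[Job]) -> float:
    return max(stretch(job) for job in jobs)


# Illustrative data only: three jobs evaluated under some schedule.
jobs = [
    Job(arrival=0.0, min_runtime=2.0, completion=2.0),
    Job(arrival=0.0, min_runtime=4.0, completion=6.0),
    Job(arrival=1.0, min_runtime=1.0, completion=7.0),
]

print("average response time:", average_response_time(jobs))
print("maximum stretch:", max_stretch(jobs))
print("makespan:", makespan(jobs))
```

A scheduler that optimizes one of these quantities will generally order or size jobs differently from one that optimizes another, which is why the abstract treats the choice of metric as an input to the allocation scheme.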
Similar resources
CIRCUMFLEX: A Scheduling Optimizer for MapReduce Workloads Involving Shared Scans
We consider MapReduce clusters designed to support multiple concurrent jobs, concentrating on environments in which the number of distinct datasets is modest relative to the number of jobs. Many datasets in such scenarios will wind up being scanned by multiple concurrent Map phase jobs. As has been noticed previously, this scenario provides an opportunity for Map phase jobs to cooperate, sharin...
MROrchestrator: A Fine-Grained Resource Orchestration Framework for Hadoop MapReduce
Efficient resource management in data centers and clouds running large distributed data processing frameworks like Hadoop is crucial for enhancing the performance of hosted MapReduce applications, and boosting the resource utilization. However, existing resource scheduling schemes in Hadoop allocate resources at the granularity of fixed-size, static portions of the nodes, called slots. A slot r...
Novel Scheduling Algorithms for Efficient Deployment of MapReduce Applications in Heterogeneous Computing Environments
Cloud computing has become an increasingly popular model for delivering applications hosted in large data centers as subscription-oriented services. Hadoop is a popular system supporting the MapReduce function, which plays a crucial role in cloud computing. The resources required for executing jobs in a large data center vary according to the job type. In Hadoop, jobs are scheduled by default on a...
Statistical Workloads for Energy Efficient MapReduce
Energy efficiency is a growing concern in modern datacenters. As Internet services increasingly rely on MapReduce workloads to fuel their flagship businesses, there is a growing need for better MapReduce energy efficiency evaluation mechanisms. We present a statistics-driven workload generation framework that distills summary statistics from production MapReduce traces and realistically reproduc...
Budget based dynamic slot allocation for MapReduce clusters
MapReduce is one of the programming models for processing large amounts of data in the cloud, where resource allocation is an active research area since it is responsible for improving the performance of Hadoop. However, resource allocation can be further improved by focusing on a set of mechanisms that includes the budget-based HFS algorithm, where the fast worker node is identified first based...